Abstract

We study spatial embeddings of random graphs in which nodes are randomly
distributed in geographical space. We let the edge probability between any two
nodes depend on the spatial distance between them and demonstrate that this
model captures many generic properties of social networks, including the
"small-world" properties, a skewed degree distribution, and, most distinctively,
the existence of community structures.
1 Introduction

Complex social networks arise in a wide range of contexts, for example corporate
partnership networks [22], scientist collaboration networks [?], company director
networks [?], film actor networks [3], sexual contact networks [?], etc. Indeed, much
attention has been given by both physical and social scientists in recent years to
modelling these networks so as to gain a better understanding of their general
structures as well as their various functions, such as information flow [?], locating
individuals [?], disease spread [?], etc. For a review of recent efforts, see for example
[?], [2] and [31]. While the number of network models in the literature is clearly
growing, not all of these models have taken full advantage of the sociological and
psychological insights into how social networks may be formed.

1.1 Spatial characteristics of social ties 

The principle of homophily, or in essence "birds of a feather flock together," has been
firmly established by many empirical studies [21]. While we clearly tend to befriend
those who are like us, in many situations we have many friends like us simply
because we are surrounded by people who are like us in the first place. For
example, if you are a millionaire and all your friends are millionaires, it might simply
be because you were born into an elite family and live in an elite area, so you only
know millionaires in your life, even though you do not actively choose to befriend
millionaires over non-millionaires. Therefore, it is useful to divide homophily into two
main types: baseline homophily and inbreeding homophily [23]. Baseline homophily
is attributed to the fact that we have a limited potential tie pool due to factors like
demography and foci of activities [?]. Inbreeding homophily is conceptualised as any
other kind of homophily measured over that potential tie pool; this may include
homophily regarding gender, religion, social class, education, and other intra-personal
or behavioural characteristics. While many network models have taken inbreeding
homophily into account [?], they have generally assumed that there are no baseline
homophily effects, i.e. that the potential tie pool for every actor equals the entire
population. However, this is clearly unrealistic, and baseline homophily effects can
potentially have profound consequences for the structure of social networks.

A basic source of baseline homophily is geographical space. As a matter of simple
opportunity and/or the need to minimise the effort of forming and maintaining a social
tie [54], we can expect to form ties with those who are geographically close to us.
Intuitively, this places a very strong constraint on our potential tie pool, and there is
ample empirical evidence supporting this claim. The earliest studies of which we are
aware date back to Festinger et al. [14] and Caplow and Forman [?], both on student
housing communities. The results showed that in these rather homogeneous
communities, the spatial arrangement of student rooms/units was an important factor
in predicting whether two dwellers had at least weak ties. Many other network studies
have reached similar results; see for example [31], [?]. More recently, Wellman [?] and
Mok et al. [21] re-analysed Wellman's earlier dataset on Torontonian personal
communities [?], [50] and noted that most personal friendships were indeed "local,"
contrary to the belief that recent technological advances have freed us from spatial
constraints. For instance, [?] found that on average 42% of "frequent contact" ties live
within a mere 1 mile radius of a typical person, while the rest of his/her ties could be
located anywhere else in the world.

1.2 General features of social networks 

Before specifying the model, we review some of the general features of social networks.
Few current models display all of them simultaneously. We suggest that, by including
baseline spatial homophily in our network model, one can reproduce all of the following
features, at least in broad terms:

1. Low tie density. The number of possible ties in a network is quadratic in the
number of actors, but most networks realise only a tiny fraction of these ties. The
cognitive ability of humans places an upper bound on the number of ties one may
maintain [?]. Other factors corresponding to baseline homophily can also play a
role [?];

2. Short average geodesic distances. The geodesic distance between two actors is
defined to be the length of the shortest path between them. In large social
networks, the typical geodesic distance between any two actors is believed to
remain small. This property was demonstrated empirically by Stanley Milgram in
his classic experiment in the 1960s [?], giving rise to the popular saying that no
one on earth is separated from you by more than six "handshakes";

3. High level of clustering. Clustering is defined to be the average probability
that two friends of an actor are themselves friends. Equivalently, it measures how
much having a mutual friend raises the conditional probability that two actors are
themselves friends. In their well-known article [47], Watts and Strogatz
demonstrated the importance of short-cuts in social networks that simultaneously
display high clustering and short average geodesic distances. The idea of short-cuts
dates back to Granovetter's arguments on the strength of weak ties [?];

4. Positively skewed actor degree distribution. The degree of an actor is the
number of social ties he/she has. In many social networks, a majority of actors
have relatively small degrees, while a small number of actors may have very large
degrees. This feature is displayed in a wide range of social networks. While it
is still debated whether generic social networks have power-law, exponential, or
other degree distributions, or indeed whether there is any generic distribution
at all, there is no doubt that degree distributions are in general positively
skewed;

5. Existence of communities. In many cases, clustering does not occur evenly
over the entire network. We can often observe subgroups of actors who are highly
connected within themselves but loosely connected to other subgroups, which are
themselves highly inter-connected. We call these highly connected subgroups
communities [?]. A long tradition in social network analysis has developed a
range of algorithms to identify these cohesive subsets of nodes [52].

An example of a social network that displays all of the above properties is the well-
known alliance network of 16 tribes in the Eastern Central Highlands of New Guinea
[39]¹. The network is depicted in Fig. 1, where nodes correspond to tribes and ties
correspond to alliances between the relevant tribes. First of all, the density of the
network is fairly low (p = 0.24) given the small size of the network. The degree
distribution is positively skewed (skewness statistic = 0.99). The network is a "small
world" in that it has a low median geodesic distance (2) and a high clustering
coefficient (0.63). Most importantly, two distinct communities can be easily observed
in Fig. 1: one is disjoint from the rest and fully connected (Nodes 1, 2, 15, and 16),
and the other is highly connected within itself (Nodes 3, 6, 7, 8, 11, and 12).
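Summary statistics such as those quoted above are straightforward to compute from an adjacency matrix. The sketch below computes the tie density and a degree skewness. The function name is hypothetical, and since the text does not say which skewness statistic was used, the moment-based Fisher-Pearson coefficient $g_1 = m_3/m_2^{3/2}$ is an assumption (bias-adjusted variants give somewhat different values).

```python
import numpy as np

def density_and_skewness(adj):
    """Tie density and degree skewness of an undirected graph.

    `adj` is a symmetric 0/1 adjacency matrix with a zero diagonal.
    Skewness is the moment-based coefficient g1 = m3 / m2**1.5 (an assumed
    convention; the original analysis may have used a different one).
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    n_edges = adj[np.triu_indices(n, k=1)].sum()   # count each tie once
    density = n_edges / (n * (n - 1) / 2)
    deg = adj.sum(axis=1).astype(float)
    dev = deg - deg.mean()
    m2 = np.mean(dev ** 2)                          # second central moment
    m3 = np.mean(dev ** 3)                          # third central moment
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return density, skew

# A 5-node star: one hub tied to four leaves.
star = np.zeros((5, 5), dtype=int)
star[0, 1:] = 1
star[1:, 0] = 1
print(density_and_skewness(star))   # density 0.4, skewness 1.5 (positive)
```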

1.3 Random graph models 

We here represent social networks by undirected graphs. An undirected graph is
defined to be a pair $(V, \mathcal{E})$, where $V = \{v_1, v_2, \ldots, v_N\}$ is the node set
denoting the individual actors in the network, and $\mathcal{E} = \{e_1, e_2, \ldots, e_M\}$, with
each edge $e_l$ being an unordered pair of nodes $e_l = (v_r, v_s)$ ($r \ne s$ and
$v_r, v_s \in V$), is the edge set denoting the social ties among the actors. A compact way
to represent a graph is through its adjacency matrix $X = [x_{ij}]$, $i, j \in \{1, 2, \ldots, N\}$,
such that $x_{ij} = 1$ iff $(v_i, v_j) \in \mathcal{E}$, and $x_{ij} = 0$ otherwise. In general, the size of
the set $V$ (or equivalently the dimensions of $X$) is fixed, but whether an edge
$(v_i, v_j) \in \mathcal{E}$ (or equivalently $x_{ij} = 1$ in $X$) is determined by a random process.
Such a random process is defined so as to reflect the underlying social dynamics.

¹ The full data set is included as a sample data set in the standard social network analysis program
UCINET, which is available at http://www.analytictech.com/ucinet.htm

The simplest model for social networks is the Erdős-Rényi or Bernoulli random
graph model, initiated independently by Paul Erdős and Alfréd Rényi [?] and Anatol
Rapoport [?], where the random process is a Bernoulli trial, i.e.

$$x_{ij} = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p, \end{cases} \tag{1}$$

where the constant $p \in [0, 1]$ is called the edge probability. In other words, the $x_{ij}$ are
independent and identically distributed (i.i.d.) Bernoulli random variables. Due to
its simplicity, the model is amenable to rigorous treatment. It is fairly straightforward
to show that, for a fixed mean degree, the average geodesic distance between any two
nodes is $\langle l \rangle \sim \ln N$, a feature that, as discussed above, resembles a property of
some real networks. However, the level of clustering in this model (which equals $p$)
vanishes as $N \to \infty$ when the mean degree is held fixed, and this is one of the major
shortcomings of this simple model for modelling social networks. Further, Erdős and
Rényi showed that there is a critical edge probability $p_c \sim N^{-1}$ above which there
exists (almost surely) a connected component containing a significant proportion of
the nodes in the network [?]. Such a component is known as the giant component.
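The vanishing clustering of the Bernoulli model can be illustrated numerically. This is a sketch, not a proof: it assumes the mean degree is held near 4 (so $p = 4/N$), and the helper names are hypothetical. Clustering is computed as three times the number of triangles divided by the number of connected triples.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Bernoulli random graph of Eq. (1): each of the n(n-1)/2 dyads is an
    independent Bernoulli(p) trial."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def global_clustering(adj):
    """3 * (number of triangles) / (number of 2-stars)."""
    a = adj.astype(float)
    triangles = np.trace(a @ a @ a) / 6.0      # each triangle appears 6 times
    k = a.sum(axis=1)
    two_stars = np.sum(k * (k - 1) / 2.0)      # connected triples
    return 3.0 * triangles / two_stars if two_stars > 0 else 0.0

rng = np.random.default_rng(0)
for n in (100, 200, 400, 800):
    c = global_clustering(erdos_renyi(n, 4.0 / n, rng))   # mean degree near 4
    print(n, round(c, 4))   # shrinks roughly like p = 4 / n
```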
For an extensive review of these results, refer to Bollobás [?] and Janson et al. [20].

1.4 The overview 

The main advantage of the Erdős-Rényi model is its simplicity. Although it does not
predict some of the generic features outlined above, it serves as a good foundation on
which to build more realistic models. In this paper, we study a generalisation of the
Erdős-Rényi random graph model for social networks which incorporates a simple
baseline spatial homophily effect in the formation of individual network ties. We note
that this class of models describes a single snapshot of a network, so temporal network
dynamics are not taken into account. In Section 2, we specify the model and examine
some of its basic properties. In Section 3, we outline our simulation methods and
discuss the main results. In Section 4, we demonstrate the application of our model to
a particular social network. In Section 5, we discuss the implications of the results
and describe ongoing research on this and more general models.

2 Spatial random graph model 

First of all, we embed the nodes of graphs in the Euclidean space $\mathbb{R}^2$ with a distance
function $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ that maps any unordered pair of node locations to a real
number. Further, $d$ satisfies the standard triangle inequality and the positivity
condition. Every node $v_i$ is assigned a coordinate $(x_i, y_i)^T$ according to some spatial
distribution, and we define $\chi = \{(x_1, y_1)^T, (x_2, y_2)^T, \ldots\}$ to be the location vector that
specifies the spatial locations of all nodes in a network. Further, we use the
shorthand $d_{ij} = d((x_i, y_i)^T, (x_j, y_j)^T)$ to denote the distance between nodes $v_i$ and $v_j$.
Next, we assume that the spatial locations of the nodes are randomly scattered in
space (and therefore all locations are mutually independent as well). In other words,
we assume that the points are distributed in space according to a homogeneous Poisson
point process (see for example [?]). Recall that a Poisson point process with rate
$\rho < \infty$ in the $d$-dimensional Euclidean space $\mathbb{R}^d$ is a process such that:

• for all disjoint subsets $A_1, A_2, \ldots, A_k \subset \mathbb{R}^d$, the random variables
$N(A_1), N(A_2), \ldots, N(A_k)$ denoting the number of points in each subset are
independently distributed;

• $N(A)$ has a Poisson distribution; and

• $E[N(A)] = \rho|A|$ for all $A \subset \mathbb{R}^d$.
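A homogeneous Poisson point process on a bounded window can be sampled in two steps: draw a Poisson count with mean $\rho|A|$, then place that many points independently and uniformly in the window. The sketch below assumes a square window $[0, L]^2$ and uses hypothetical function names. Conditioning on the count gives i.i.d. uniform points, which is the relevant case when the number of nodes is fixed (as in the 100-node simulations of Section 3).

```python
import numpy as np

def poisson_points(rate, side, rng):
    """Homogeneous Poisson point process on the square [0, side]^2.

    The total count is Poisson(rate * area); given the count, the points
    are i.i.d. uniform, which yields independent Poisson counts on
    disjoint subsets with mean rate * |A|.
    """
    n = rng.poisson(rate * side * side)
    return rng.uniform(0.0, side, size=(n, 2))

def fixed_count_points(n, side, rng):
    """The same process conditioned on exactly n points: i.i.d. uniform."""
    return rng.uniform(0.0, side, size=(n, 2))

rng = np.random.default_rng(1)
pts = poisson_points(rate=1.0, side=10.0, rng=rng)
print(len(pts))   # fluctuates around rate * area = 100
```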

To model the baseline homophily effect, we let the edge probability between two
nodes depend on the spatial distance between them. That is, given $\chi$,
$P(x_{ij} = 1 \mid \chi) = f(d_{ij})$, where $f : \mathbb{R} \to [0, 1]$. Motivated by the discussion in the last
section, we consider the case where $f$ is a simple step function, i.e.

$$f(d_{ij}) = \begin{cases} p + p_b, & d_{ij} < H, \\ p - \Delta, & d_{ij} \ge H, \end{cases} \tag{2}$$
where $p$ is the average density of the network, $H$ is the neighbourhood radius, and $p_b$ is
the proximity bias, which specifies how sensitive actors are to geographical space when
establishing social links. Thus $p_b$ probabilistically controls the location of the potential
tie pool. Of course, we have assumed that this spatial sensitivity is the same for all
actors in the network. Further, $\Delta = \Delta(p_b, H \mid \chi)$ is a correction term, introduced to
keep the expected average density constant at $p$, given $\chi$, for all feasible values of $H$
and $p_b$, i.e.

$$\binom{N}{2}^{-1} \sum_{i<j} P(x_{ij} = 1 \mid \chi) = p, \tag{3}$$
and it serves no other substantive purpose. Without keeping the density constant, it
would be difficult to isolate the effects of $p_b$ and $H$ on the graph's structural properties,
because of the confounding effect of a varying expected number of edges in the
model.

To calculate $\Delta$, we first determine the number of all possible edges shorter than
the neighbourhood radius $H$ in the network embedded in $\chi$; call this number $S_{<H}(\chi)$.
When $N$ is sufficiently large (ignoring boundary effects), $S_{<H}(\chi) \approx N\pi\rho H^2/2$, since
each node has on average $\rho\pi H^2$ other nodes within distance $H$ and each pair is counted
twice. The number of possible edges longer than the neighbourhood radius is therefore
$S_{>H}(\chi) = \binom{N}{2} - S_{<H}(\chi)$. By definition, the expected density within neighbourhoods
is $p + p_b$, so the expected number of realised edges within all neighbourhoods is
$(p + p_b)S_{<H}(\chi)$. It follows that, in order to keep the density equal to $p$, the expected
number of realised edges outside the neighbourhoods must be $\binom{N}{2}p - (p + p_b)S_{<H}(\chi)$.
As a result,²

$$\Delta = p - \frac{\binom{N}{2}p - (p + p_b)S_{<H}(\chi)}{S_{>H}(\chi)} = \frac{p_b\, S_{<H}(\chi)}{S_{>H}(\chi)}. \tag{4}$$
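The computation of $\Delta$ for a given point configuration can be sketched as follows. This is a minimal illustration assuming Euclidean distance on an array of locations; `correction_term` and `edge_probability` are hypothetical names. Here $S_{<H}(\chi)$ is counted directly from the points, so boundary effects are included exactly rather than through the large-$N$ approximation, and the feasibility conditions $0 \le p + p_b \le 1$ and $0 \le p - \Delta \le 1$ are checked.

```python
import numpy as np

def pair_distances(points):
    """Euclidean distance matrix for an (N, 2) array of locations chi."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def correction_term(points, H, p, p_b):
    """Return (Delta, S_<H, S_>H) for a fixed configuration, using Eq. (4)."""
    n = len(points)
    d = pair_distances(points)[np.triu_indices(n, k=1)]
    s_short = int(np.count_nonzero(d < H))     # S_<H(chi), counted directly
    s_long = d.size - s_short                   # S_>H(chi) = C(N,2) - S_<H(chi)
    delta = p_b * s_short / s_long
    if not (0.0 <= p + p_b <= 1.0 and 0.0 <= p - delta <= 1.0):
        raise ValueError("(p, p_b) is infeasible for this point configuration")
    return delta, s_short, s_long

def edge_probability(d, H, p, p_b, delta):
    """Step function f(d) of Eq. (2)."""
    return np.where(d < H, p + p_b, p - delta)

rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 10.0, size=(100, 2))    # 100 nodes at unit rate (assumed)
delta, s_short, s_long = correction_term(pts, H=1.5, p=0.05, p_b=0.5)
d = pair_distances(pts)[np.triu_indices(100, k=1)]
print(delta, edge_probability(d, 1.5, 0.05, 0.5, delta).mean())
```

The mean of `edge_probability` over all dyads equals $p$ up to rounding, which is exactly the constraint in Eq. (3).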
Since the probability $P(x_{ij} = 1 \mid \chi)$ is bounded by 0 and 1, we require $0 \le p + p_b \le 1$ and
$0 \le p - \Delta \le 1$. These conditions define upper and lower bounds for $p_b$ given $p$ and
$H$. We note that at the particular set of values where $p = S_{<H}(\chi)/\binom{N}{2}$ and
$p_b = 1 - p$, i.e. when every pair of nodes less than distance $H$ apart is joined with
probability 1 and every other pair with probability 0, this model becomes the so-called
random geometric graph model [?]³. A typical instance of the spatial random graph can
be found in Fig. 2.

It is convenient to express our model in exponential form, which fits into the
exponential random graph modelling framework. More details of the framework can
be found in Wasserman and Pattison [?]. The modelling framework has recently
received renewed attention in the physics community; see [?] and [31]. Let $x$ be an
instance of the random graph with spatial locations of nodes specified by $\chi$. Then
the general form of the probability function for obtaining $x$ is given by

$$P(X = x \mid \chi) = \frac{1}{Z(\chi)} \exp[-\mathcal{H}(x \mid \chi)], \tag{6}$$

where $\mathcal{H}(x \mid \chi)$ is the Hamiltonian of graph $x$ given the locations of nodes specified by
$\chi$, and

$$Z(\chi) = \sum_{x} \exp[-\mathcal{H}(x \mid \chi)] \tag{7}$$

is the partition function of the model. The Hamiltonian can take any (sensible) form
that reflects the dependency of edges. The simplest choice is $-\mathcal{H}(x) = \theta_0 L(x \mid \chi)$, where
$L(x \mid \chi) = L(x)$ is the number of edges in the graph $x$ (independent of $\chi$) and $\theta_0$ is its
associated parameter. By tuning $\theta_0$, we can change the expected density of a typical
graph in the model. This gives us the classical Erdős-Rényi random graph model
introduced in Section 1. To define $\mathcal{H}(x \mid \chi)$ for our simple spatial model, we first let
$L_<(x \mid \chi) = \sum_{i<j,\, d_{ij}<H} x_{ij}$ and $L_>(x \mid \chi) = \sum_{i<j,\, d_{ij}\ge H} x_{ij}$ be the numbers of edges
shorter and longer than $H$ respectively; then the Hamiltonian for our model can be
written as

$$-\mathcal{H}(x \mid \chi) = \theta_< L_<(x \mid \chi) + \theta_> L_>(x \mid \chi), \tag{8}$$

where $\theta_<$ and $\theta_>$ are parameters whose values can be calculated directly from Eq. (2),
i.e.

$$\theta_< = \operatorname{logit}(p + p_b), \qquad \theta_> = \operatorname{logit}(p - \Delta), \tag{9}$$

where $\operatorname{logit} q = \log[q/(1 - q)]$. Equivalently, we can write the Hamiltonian in
another form: $-\mathcal{H}(x \mid \chi) = \theta L(x \mid \chi) + \theta' L_<(x \mid \chi)$, where $L(x \mid \chi) = \sum_{i<j} x_{ij}$
$(= L_<(x \mid \chi) + L_>(x \mid \chi))$, $\theta = \theta_>$ and $\theta' = \theta_< - \theta_>$. In the following, however, we
shall only use the former form of the Hamiltonian.

² If the expected number of nodes within the neighbourhood is the same across all nodes (e.g.
when we have a homogeneous point process with periodic boundary conditions), then let the expected
number of nodes within a node's neighbourhood (not including itself) be $E[s_{<H}]$. We can then repeat
the above and arrive at a simpler form for the correction term,

$$\Delta = \frac{E[s_{<H}]}{(N - 1) - E[s_{<H}]}\, p_b. \tag{5}$$

³ We later also became aware of a generalisation of the random geometric graph model called the
random connection model, in which the edge probability is taken to be a general decreasing function $g$ of
spatial distance [?]. In [?], some rigorous results regarding the giant cluster of the model have been
obtained; however, the main variable of their model is the Poisson rate $\rho$ in space, with $g$ kept
general, whereas here our focus is on the function $g$.

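The simplicity of the Hamiltonian allows us to write down the partition function
in closed form,

$$\begin{aligned}
Z(\chi) &= \sum_x \exp[-\mathcal{H}(x \mid \chi)] = \sum_x \exp\big(\theta_< L_<(x \mid \chi) + \theta_> L_>(x \mid \chi)\big) \\
&= \sum_x \exp\Big(\theta_< \sum_{i<j,\, d_{ij}<H} x_{ij} + \theta_> \sum_{i<j,\, d_{ij}\ge H} x_{ij}\Big) \\
&= \big(1 + e^{\theta_<}\big)^{S_{<H}(\chi)} \big(1 + e^{\theta_>}\big)^{S_{>H}(\chi)}.
\end{aligned}$$

Using this explicit form, one can double-check the constant density in our model for
all $p_b$ and $H$ given $\chi$:

$$\begin{aligned}
\Big\langle \tbinom{N}{2}^{-1} L(x \mid \chi) \Big\rangle
&= \tbinom{N}{2}^{-1} \frac{1}{Z(\chi)} \sum_x \big[L_<(x \mid \chi) + L_>(x \mid \chi)\big] \exp[-\mathcal{H}(x \mid \chi)] \\
&= \tbinom{N}{2}^{-1} \frac{1}{Z(\chi)} \left( \frac{\partial Z(\chi)}{\partial \theta_<} + \frac{\partial Z(\chi)}{\partial \theta_>} \right) \\
&= \tbinom{N}{2}^{-1} \big[ S_{<H}(\chi)(p + p_b) + S_{>H}(\chi)(p - \Delta) \big] = p,
\end{aligned}$$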
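which is what we expected. In principle, we can calculate the ensemble average
$\langle Q(x \mid \chi) \rangle$ of any statistic, for example the clustering coefficient, from the partition
function by adding an auxiliary term $\delta\mathcal{H}(x \mid \chi) = \gamma Q(x \mid \chi)$ to the Hamiltonian. Writing
$Z_\gamma(\chi) = \sum_x \exp[-\mathcal{H}(x \mid \chi) - \gamma Q(x \mid \chi)]$ for the partition function of the modified
model, we have

$$\langle Q(x \mid \chi) \rangle = \lim_{\gamma \to 0} \frac{1}{Z_\gamma(\chi)} \sum_x Q(x \mid \chi) \exp[-\mathcal{H}(x \mid \chi) - \gamma Q(x \mid \chi)] \tag{10}$$

$$= -\lim_{\gamma \to 0} \frac{1}{Z_\gamma(\chi)} \frac{\partial Z_\gamma(\chi)}{\partial \gamma}. \tag{11}$$

However, as is the case in many other statistical mechanics models, expressions like
Eq. (11) are very difficult to evaluate exactly, as a general approach is not yet available.
In this study, we resort to numerical simulations to explore the properties of our
model.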
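The closed-form partition function and the density check above can be verified numerically on a toy configuration small enough to enumerate every graph. In this sketch the function names and values are hypothetical ($N = 5$, so 10 dyads, of which $S_{<H} = 3$ are short; $p = 0.2$, $p_b = 0.3$). $\Delta$ comes from Eq. (4), the parameters from Eq. (9), and the derivatives of $\log Z$ are taken by central differences.

```python
import itertools
import math

def logit(q):
    return math.log(q / (1.0 - q))

def log_z_closed(s_short, s_long, th_lt, th_gt):
    """log of (1 + e^theta_<)^S_<H * (1 + e^theta_>)^S_>H."""
    return s_short * math.log1p(math.exp(th_lt)) + s_long * math.log1p(math.exp(th_gt))

def log_z_brute(short_mask, th_lt, th_gt):
    """Sum exp(theta_< L_< + theta_> L_>) over all 2^(#dyads) graphs."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(short_mask)):
        l_short = sum(xi for xi, s in zip(x, short_mask) if s)
        l_long = sum(x) - l_short
        total += math.exp(th_lt * l_short + th_gt * l_long)
    return math.log(total)

short_mask = [True] * 3 + [False] * 7          # 3 short dyads, 7 long dyads
s_short, s_long = 3, 7
p, p_b = 0.2, 0.3
delta = p_b * s_short / s_long                   # Eq. (4)
th_lt, th_gt = logit(p + p_b), logit(p - delta)  # Eq. (9)
print(log_z_closed(s_short, s_long, th_lt, th_gt), log_z_brute(short_mask, th_lt, th_gt))

# Expected density: (dlogZ/dtheta_< + dlogZ/dtheta_>) / C(N,2).
eps = 1e-6
d_lt = (log_z_closed(s_short, s_long, th_lt + eps, th_gt)
        - log_z_closed(s_short, s_long, th_lt - eps, th_gt)) / (2 * eps)
d_gt = (log_z_closed(s_short, s_long, th_lt, th_gt + eps)
        - log_z_closed(s_short, s_long, th_lt, th_gt - eps)) / (2 * eps)
density = (d_lt + d_gt) / (s_short + s_long)
print(density)   # approximately p = 0.2
```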



3 Simulation results and discussions 

The Poisson rate $\rho$ and the neighbourhood radius $H$ are relative to each other, so we
always fix $\rho = 1$ and vary $H$ only. On the other hand, the choice of the value of $H$ is
not crucial as long as $H$ is sufficiently large. When $H$ is too small, the vast majority
of ties connected to a node inevitably come from outside the neighbourhood, and
the model therefore behaves like the simple Erdős-Rényi random graph model. Here we
fix $H = 3/2$. The main programme of the simulations below is to vary the proximity
bias $p_b$ and investigate its effects on the overall structure of the graphs. We used the
Markov chain Monte Carlo method outlined in Snijders [?] for all our simulations. Each
individual data point below is the result of a simulation run (250,000 Markov iterations)
of random graphs with 100 nodes on a fixed $\chi$. The burn-in phase of each run is
about 30,000 iterations, and statistics from this phase were removed before further
analysis. The estimates of the statistics are then calculated as simple averages over
the post-burn-in phase, and we collect these estimates for six realisations of $\chi$.
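A minimal sketch of such a simulation is given below, assuming a Metropolis edge-toggle sampler that proposes flipping one uniformly chosen dyad per iteration. This is a stand-in, not necessarily the exact algorithm of the cited MCMC method, and `mcmc_spatial_graph` is a hypothetical name. Because the Hamiltonian in Eq. (8) makes dyads conditionally independent given $\chi$, exact sampling would also be possible; the toggle sampler simply mirrors the set-up described above, including the burn-in.

```python
import numpy as np

def mcmc_spatial_graph(points, H, p, p_b, n_iter, burn_in, rng):
    """Metropolis edge-toggle sampler for the model of Eqs. (6)-(9).

    Adding an edge changes -H(x|chi) by +theta and removing it by -theta,
    where theta is theta_< or theta_> depending on the dyad's length.
    Returns the final adjacency matrix and post-burn-in averages of (L_<, L_>).
    """
    n = len(points)
    iu, ju = np.triu_indices(n, k=1)
    short = np.linalg.norm(points[iu] - points[ju], axis=1) < H
    delta = p_b * short.sum() / (~short).sum()          # Eq. (4)
    q = np.where(short, p + p_b, p - delta)              # Eq. (2)
    if np.any(q <= 0) or np.any(q >= 1):
        raise ValueError("edge probabilities must lie strictly inside (0, 1)")
    theta = np.log(q / (1.0 - q))                         # Eq. (9)
    ks = rng.integers(short.size, size=n_iter)            # proposed dyads
    log_u = np.log(rng.random(n_iter))                    # acceptance thresholds
    x = np.zeros(short.size, dtype=bool)                  # start from the empty graph
    l_counts = [0, 0]                                     # current [L_<, L_>]
    sums = [0.0, 0.0]
    for t in range(n_iter):
        k = ks[t]
        change = -theta[k] if x[k] else theta[k]          # change in -H(x|chi)
        if log_u[t] < change:                             # Metropolis acceptance
            x[k] = not x[k]
            l_counts[0 if short[k] else 1] += 1 if x[k] else -1
        if t >= burn_in:
            sums[0] += l_counts[0]
            sums[1] += l_counts[1]
    adj = np.zeros((n, n), dtype=int)
    adj[iu[x], ju[x]] = 1
    kept = n_iter - burn_in
    return adj + adj.T, (sums[0] / kept, sums[1] / kept)
```

For example, calling it with 100 uniform points at unit rate, `H=1.5`, `n_iter=250_000` and `burn_in=30_000` mirrors the run length and burn-in quoted above; the post-burn-in averages should sit close to $S_{<H}(\chi)(p + p_b)$ and $S_{>H}(\chi)(p - \Delta)$.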

3.1 Number of short and long edges 

In Fig. 3 we plot the average number of edges shorter than ($\langle L_<(x) \rangle$) and longer than
($\langle L_>(x) \rangle$) the neighbourhood radius; we call them short and long edges respectively.
When $p_b = 0$, the potential tie pool of each node equals the entire population
(except itself). In this case, since there are many more potential long edges than
short edges in the graph, the graphs are on average dominated by long edges. As
$p_b$ increases, the potential tie pool of each node concentrates more and more on the
geographically close population, so short edges dominate. The two scatter plots in
Fig. 3 clearly display opposing linear trends. The linearity is simply a result of our
model definition; refer to Eq. (2). Also, as $p_b$ increases, there is a growing variance of
the statistics for fixed $p_b$. This is because, as the difference between the short-edge
and long-edge probabilities grows, the graphs depend more and more on the
configuration of the specific instance of the point process. However, as $p_b$ approaches
$1 - p$, the variability decreases again. This is because large $p_b$ values place significant
constraints on the feasible instances of the point process, and thus only a small number
of instances $\chi$ are feasible for large $p_b$ (see the bounds on $p_b$ in Section 2).
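For a fixed $\chi$, the opposing linear trends follow directly from Eqs. (2) and (4); a short worked derivation:

```latex
\begin{aligned}
\langle L_<(x \mid \chi) \rangle &= S_{<H}(\chi)\,(p + p_b),\\
\langle L_>(x \mid \chi) \rangle &= S_{>H}(\chi)\,(p - \Delta)
  = S_{>H}(\chi)\,p - S_{<H}(\chi)\,p_b .
\end{aligned}
```

So, for a given $\chi$, the two averages change with slopes $+S_{<H}(\chi)$ and $-S_{<H}(\chi)$ in $p_b$, and their sum stays fixed at $\binom{N}{2}p$.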

3.2 Small-world properties 

Let us now investigate the effect of $p_b$ on the global structure of the graphs. First,
let us define the geodesic distance, or graph distance, between two nodes. Note that
this distance is independent of the spatial distance between the nodes involved. Given
a graph $X = \{x_{ij}\}$, let the geodesic distance between $v_i$ and $v_j$, $l_{ij}(X)$, be the length of
the shortest self-avoiding path connecting $v_i$ and $v_j$. If $v_i$ and $v_j$ are disconnected, we
assign $l_{ij} = \infty$. Now, given $X$, define

$$w(X) = \left[ \binom{N}{2}^{-1} \sum_{i<j} \frac{1}{l_{ij}(X)} \right]^{-1} \tag{12}$$

to be the (harmonic) mean of $l_{ij}(X)$ over all possible pairs of nodes in $X$, with
$1/\infty = 0$. $w(X)$ is a simple measure of the connectivity of $X$. Refer to the upper part
of Fig. 4 for a plot of $\langle w(X) \rangle$ against $p_b$. From the plot, $\langle w(X) \rangle$ remains small for a
large range of $p_b$. Then there appears to be a critical $p_b$, as in the Watts-Strogatz model
[?], at which $w$ increases dramatically. This indicates a switch from the simple random
graph regime, where "short-cuts" between neighbourhoods are abundant, to the random
geometric graph regime, where most edges lie within each actor's neighbourhood.
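Eq. (12) can be evaluated with one breadth-first search per node. The sketch below uses hypothetical function names and assumes an unweighted 0/1 adjacency matrix. It applies the convention $1/\infty = 0$, so disconnected pairs lower the mean of $1/l_{ij}$ and therefore raise $w(X)$, which is what makes $w$ jump once the graph fragments.

```python
from collections import deque

import numpy as np

def bfs_distances(adj, source):
    """Hop distances from `source`; unreachable nodes get np.inf."""
    n = len(adj)
    dist = np.full(n, np.inf)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            if dist[v] == np.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def harmonic_mean_geodesic(adj):
    """w(X) of Eq. (12): inverse of the mean of 1/l_ij over pairs i < j."""
    adj = np.asarray(adj)
    n = len(adj)
    inv_sum = 0.0
    for i in range(n):
        d = bfs_distances(adj, i)[i + 1:]       # pairs (i, j) with j > i
        inv_sum += np.sum(1.0 / d)              # 1 / inf evaluates to 0.0
    mean_inv = inv_sum / (n * (n - 1) / 2)
    return np.inf if mean_inv == 0 else 1.0 / mean_inv

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(harmonic_mean_geodesic(path))   # pairs at distances 1, 2, 1 give w = 1.2
```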

Further, we study the level of clustering in the model. Let $t_1(X)$ be the number
of 3-cycles, or triangles, in graph $X$, i.e. $t_1(X) = \sum_{i<j<k} x_{ij} x_{jk} x_{ik}$. Also
let $s_2(X)$ be the number of 2-stars in graph $X$, i.e. $s_2(X) = \sum_i \sum_{j<k;\; j,k \ne i} x_{ij} x_{ik}$. Then we
define the global clustering coefficient to be

$$C(X) = \frac{3\, t_1(X)}{s_2(X)}. \tag{13}$$
$C(X)$ measures the overall level of clustering in a network, or in other words how
much, on average, an actor's friends' friends are also the actor's friends. By definition,
$0 \le C(X) \le 1$ for all $X$. Refer to the lower part of Fig. 4 for a plot of the average $\langle C \rangle$
over all $X$ in the ensemble as we change $p_b$. This model also displays behaviour similar
to the Watts-Strogatz model, and $\langle C \rangle$ increases steadily as $p_b$ increases. The source of
this clustering is entirely spatial, i.e. triangles are likely to be formed simply because
the nodes involved are close to each other spatially.
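To illustrate that this clustering can arise purely from space, the sketch below draws one graph per value of $p_b$ on a fixed set of points and evaluates Eq. (13). It assumes 100 uniform points at unit rate, $H = 1.5$ and $p = 0.05$, draws each dyad independently (which the model permits given $\chi$), and uses hypothetical function names. Triangles are counted as $\mathrm{tr}(X^3)/6$ and 2-stars as $\sum_i \binom{k_i}{2}$, which agree with the definitions of $t_1$ and $s_2$.

```python
import numpy as np

def clustering(adj):
    """C(X) = 3 t_1(X) / s_2(X) of Eq. (13)."""
    a = adj.astype(float)
    t1 = np.trace(a @ a @ a) / 6.0             # each triangle is counted 6 times
    k = a.sum(axis=1)
    s2 = np.sum(k * (k - 1) / 2.0)             # 2-stars centred at every node
    return 3.0 * t1 / s2 if s2 > 0 else 0.0

def sample_spatial_graph(points, H, p, p_b, rng):
    """One draw from the step-function model; dyads are independent given chi."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)
    short = d[iu] < H
    delta = p_b * short.sum() / (short.size - short.sum())   # Eq. (4)
    prob = np.where(short, p + p_b, p - delta)                # Eq. (2)
    adj = np.zeros((n, n), dtype=int)
    adj[iu] = rng.random(short.size) < prob
    return adj + adj.T

rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 10.0, size=(100, 2))    # N = 100 at unit rate (assumed)
for p_b in (0.0, 0.3, 0.5):
    print(p_b, round(clustering(sample_spatial_graph(pts, 1.5, 0.05, p_b, rng)), 3))
```

These are single draws and will fluctuate; averaging over many draws and point configurations would be needed to trace a smooth curve.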

Overall, there is a range of $p_b$ in which the model simultaneously displays a relatively
low average path length and a significant level of clustering, a signature of a
"small-world" model.



3.3 Community structures 

Although the clustering coefficient defined above measures the overall level of clustering
in a graph, there remains the question of whether clustering is distributed evenly over
the entire graph on average; in other words, do the triangles in the graph tend to clump
together or not? The "clumpiness" of triangles can be measured by the number of
higher-order triangles. A 2-triangle is defined to be the combination of two triangles
sharing a common edge (called the base edge). In general, a $k$-triangle is defined to
be the combination of $k$ triangles all sharing a common base edge, and we let $t_k(X)$
be the number of $k$-triangles in $X$. Let $g_{ij}(X)$ be the number of two-paths connecting
$v_i$ and $v_j$; then a useful and convenient way to combine all $t_k(X)$ into a single
measure is as follows:

$$T_\lambda(X) = 3t_1(X) - \frac{t_2(X)}{\lambda} + \frac{t_3(X)}{\lambda^2} - \cdots + (-1)^{N-3} \frac{t_{N-2}(X)}{\lambda^{N-3}} \tag{14}$$

$$= \lambda \sum_{i<j} x_{ij} \left\{ 1 - \left( 1 - \frac{1}{\lambda} \right)^{g_{ij}(X)} \right\}, \tag{15}$$
where $\lambda$ is an arbitrary constant. For a detailed discussion of the motivation for this
definition, see [35]. A plot of $\langle T_\lambda(X) \rangle$ for $\lambda = 2$ against $p_b$ can be found in Figure
5. There we can see a steady increase in $\langle T_2(X) \rangle$ as $p_b$ increases. This suggests that,
for large $p_b$, graphs have a higher tendency on average to form clumps of triangles,
which we call communities. Inspection of instances of graphs when $p_b$ is large
suggests that the delineation of communities is determined by large gaps in the
spatial distribution of the nodes. This happens in spite of the fact that the nodes
are distributed uniformly at random in space. This phenomenon can be compared with
the observation that communities are likely to be separated by large streets, railroad
tracks, etc. [23].
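The equivalence of Eqs. (14) and (15) can be checked numerically. The sketch below (hypothetical function names) computes $t_k(X)$ from the two-path counts $g_{ij}(X)$ on existing edges, using $t_k = \sum_{i<j} x_{ij} \binom{g_{ij}}{k}$ for $k \ge 2$ and $\sum_{i<j} x_{ij} g_{ij} = 3t_1$ (each triangle has three base edges), and then evaluates both forms.

```python
from math import comb

import numpy as np

def k_triangle_counts(adj):
    """Return {k: t_k(X)} for k = 1, ..., N-2."""
    a = np.asarray(adj).astype(int)
    n = len(a)
    g = a @ a                                    # g_ij = number of two-paths i-?-j
    iu = np.triu_indices(n, k=1)
    g_edges = g[iu][a[iu] == 1]                  # two-path counts on existing edges
    t = {1: int(g_edges.sum()) // 3}
    for k in range(2, n - 1):
        t[k] = sum(comb(int(m), k) for m in g_edges)
    return t

def alt_k_triangles_series(adj, lam):
    """Eq. (14): 3 t_1 - t_2 / lam + t_3 / lam^2 - ..."""
    t = k_triangle_counts(adj)
    total = 3 * t[1]
    for k in range(2, len(adj) - 1):
        total += (-1) ** (k + 1) * t[k] / lam ** (k - 1)
    return total

def alt_k_triangles_closed(adj, lam):
    """Eq. (15): lam * sum_{i<j} x_ij (1 - (1 - 1/lam)^g_ij)."""
    a = np.asarray(adj).astype(int)
    g = a @ a
    iu = np.triu_indices(len(a), k=1)
    return lam * np.sum(a[iu] * (1.0 - (1.0 - 1.0 / lam) ** g[iu]))

K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
print(alt_k_triangles_series(K4, 2.0), alt_k_triangles_closed(K4, 2.0))   # 9.0 9.0
```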

We have now shown that a typical graph in the model is likely to have strong
community structures when $p_b$ is reasonably large. One possible implication of such
structures is that the whole graph may be composed of disjoint communities. To
detect whether this is the case, we consider the average size of the largest connected
component over the ensemble. We define the size of the component containing node
$v_i$, $\gamma_i(X)$, to be the number of nodes at finite path length from $v_i$ (including $v_i$ itself,
with $l_{ii} = 0$), i.e. $\gamma_i(X) = \sum_j \mathbf{1}[l_{ij}(X) < \infty]$, where $\mathbf{1}[\cdot]$ is the indicator function. Let

$$\Gamma(X) = \max_i \gamma_i(X) \tag{16}$$

be the number of nodes of $X$ in the largest component. We have plotted $\langle \Gamma(X) \rangle$
against $p_b$ in Fig. 6. When $p_b$ is small, the vast majority of nodes are contained in
the largest component. As $p_b$ increases, there is a higher chance that the largest
component no longer contains most nodes, leaving a significant proportion of nodes in
smaller isolated components. The network instance in Fig. 2 is an example of such
a situation.
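$\Gamma(X)$ in Eq. (16) can be computed by exploring each component once rather than computing all $l_{ij}$. The sketch below uses an iterative depth-first search over an undirected 0/1 adjacency matrix; `largest_component_size` is a hypothetical name.

```python
import numpy as np

def largest_component_size(adj):
    """Gamma(X) of Eq. (16): number of nodes in the largest connected component."""
    adj = np.asarray(adj)
    n = len(adj)
    seen = np.zeros(n, dtype=bool)
    best = 0
    for start in range(n):
        if seen[start]:
            continue                        # already counted in an earlier component
        seen[start] = True
        stack, size = [start], 0
        while stack:                        # iterative depth-first search
            u = stack.pop()
            size += 1
            for v in np.flatnonzero(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        best = max(best, size)
    return best

two_triangles = np.zeros((7, 7), dtype=int)     # two triangles plus an isolated node
for a_, b_ in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    two_triangles[a_, b_] = two_triangles[b_, a_] = 1
print(largest_component_size(two_triangles))   # 3
```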

3.4 Actor degree distributions 

Fig. 7 shows the average degree distribution of graphs for various values of $p_b$. The
distributions are all positively skewed, and as $p_b$ increases the corresponding degree
distribution has an increasingly fatter tail. As $p_b$ becomes very large, the distribution
becomes bimodal. This is due to a boundary effect: nodes near the spatial boundary
of the graph are disadvantaged by having fewer possible neighbours within their
neighbourhood radius. This is confirmed by studying the correlation between the
degree of the nodes and their spatial distance from the centre of a typical graph. A
scatterplot for a typical instance can be found in Fig. 8, and in this case there is a
significant correlation between the two ($p$-value $< 0.01$).
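The boundary check can be sketched as follows, assuming the graph is simulated on a square and its centre is the midpoint of the square. `degree_distance_correlation` is a hypothetical helper that reports only the Pearson correlation, not the significance test quoted above, and the parameter values in the illustration are assumptions. A negative correlation means nodes farther from the centre tend to have lower degree.

```python
import numpy as np

def degree_distance_correlation(points, adj, centre):
    """Pearson correlation between node degree and distance from `centre`."""
    degree = np.asarray(adj).sum(axis=1)
    dist = np.linalg.norm(np.asarray(points, float) - np.asarray(centre, float), axis=1)
    return np.corrcoef(dist, degree)[0, 1]

# Illustration on one simulated instance (assumed: N = 100 at unit rate,
# H = 1.5, p = 0.05, p_b = 0.5, square of side 10).
rng = np.random.default_rng(4)
side, H, p, p_b = 10.0, 1.5, 0.05, 0.5
pts = rng.uniform(0.0, side, size=(100, 2))
iu = np.triu_indices(len(pts), k=1)
d = np.linalg.norm(pts[iu[0]] - pts[iu[1]], axis=1)
short = d < H
delta = p_b * short.sum() / (~short).sum()          # Eq. (4)
prob = np.where(short, p + p_b, p - delta)           # Eq. (2)
adj = np.zeros((len(pts), len(pts)), dtype=int)
adj[iu] = rng.random(d.size) < prob
adj = adj + adj.T
print(degree_distance_correlation(pts, adj, centre=(side / 2, side / 2)))
```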

4 An office communication network 

We can use our current model to gain insights into the underlying dynamics of a social
network. Our example here is a communication network of 33 individuals observed
over two days in a single-floor office that is part of a large organisation in Australia. We
define a communication tie to exist between two individuals if and only if each of the
two parties has sought information from the other more than three times over the
two-day period. This definition avoids brief idiosyncratic encounters that create much
noise on top of the regular communication pattern. The network is depicted in Fig.
9. The locations of the nodes in the figure are the actual locations of the individuals'
cubicles or rooms up to a linear scaling. In particular, the dimensions of the office space
are scaled so that all relevant locations fit into a 1 × 1 unit square.

The network displays the generic features introduced in Section 1.2. First of all,
the density is low (p = 0.131). The degree distribution is (slightly) positively skewed
(skewness statistic = 0.303). The clustering coefficient is high (0.389), while the median
geodesic distance is very small (2). Also, in a spatially rearranged layout of the network
in Fig. 11, one can easily identify two distinct communities.

Indeed, one can reasonably conjecture that spatial processes are important in this
communication network. To verify this, we use our current model and estimate the
model parameters, i.e. $p$, $H$, and $p_b$ (see Eq. (2)). The empirical edge probability $p(d)$
is plotted in Fig. 10. It displays the expected sharp drop at small values of $d$. Based
on this, we fit $p(d)$ with a simple step function whose neighbourhood size $H$ is chosen
to minimise the sum of squared errors. In this case, $H$ is found to be 0.2. As
the overall density $p$ is 0.131, the proximity bias $p_b$ is estimated to be 0.259, and the
correction term $\Delta$ is estimated to be 0.047⁴. The large bias $p_b$ indicates that there is
a strong spatial component in the underlying social process.
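The fitting procedure can be sketched as below. This is one reading of it under stated assumptions: distances are binned with a fixed width, each candidate $H$ is a bin boundary, the two step levels are the means of the binned probabilities on either side, and $p_b$ and $\Delta$ are then taken from the tie rate below $H$ and Eq. (4). The function name and bin width are hypothetical, since the source does not state its binning.

```python
import numpy as np

def fit_step_function(dist, edge, bin_width=0.05):
    """Fit the step function of Eq. (2) to dyad-level data.

    dist, edge: 1-D arrays over dyads (pairwise distance, 0/1 tie indicator).
    Returns (H, p, p_b, delta).
    """
    dist = np.asarray(dist, float)
    edge = np.asarray(edge, float)
    idx = np.floor(dist / bin_width).astype(int)             # distance bin of each dyad
    bins = np.unique(idx)
    p_bin = np.array([edge[idx == b].mean() for b in bins])  # empirical p(d) per bin
    centres = (bins + 0.5) * bin_width
    best_H, best_sse = None, np.inf
    for b in bins[1:]:                                       # candidate H at bin boundaries
        H = b * bin_width
        inside = centres < H
        sse = (np.sum((p_bin[inside] - p_bin[inside].mean()) ** 2)
               + np.sum((p_bin[~inside] - p_bin[~inside].mean()) ** 2))
        if sse < best_sse:
            best_H, best_sse = H, sse
    p = edge.mean()                                          # overall density
    short = dist < best_H
    p_b = edge[short].mean() - p                             # excess tie rate below H
    delta = p_b * short.sum() / (~short).sum()               # Eq. (4)
    return best_H, p, p_b, delta

dist = [0.05, 0.07, 0.12, 0.15, 0.32, 0.45, 0.58, 0.71, 0.86, 0.93]
edge = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
print(fit_step_function(dist, edge, bin_width=0.1))   # H near 0.3
```

With $p_b$ computed this way, $p - \Delta$ equals the empirical tie rate beyond $H$, so the fitted step function reproduces the overall density $p$ exactly.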

⁴ Note that one can alternatively derive the value of $\Delta$ using Eq. (4).

5 Conclusion



In this study, we have performed Monte Carlo simulations on a class of spatial random
graph models. We have found that the properties of these models differ significantly
from those of the simple Erdős-Rényi random graph model. In particular, for a range of
$p_b$ values, the model simultaneously displays many general features of social networks
discussed in Section 1, whereas the Erdős-Rényi random graph model fails in many
respects.

We note that the properties of the model are similar to those of the well-known
Watts-Strogatz "small world" model [?], but that the current model has the advantage
of specifying an explicit probability distribution over the collection of all graphs with
a given number of nodes. It is also important to note that the clustering properties
of the current model arise entirely from the Poisson process describing node locations,
and hence from a form of spatial baseline homophily. It is clearly an empirical question
whether such models for social networks provide an adequate descriptive account, or
whether it is necessary also to incorporate inbreeding homophily effects (as in
exponential random graph selection models [42]) or endogenous network processes
characteristic of the Watts-Strogatz model [?] and more general exponential random
graph model specifications [?]. This question has received little attention in the
network literature despite its fundamental importance to our understanding of
network evolution.

In order to identify models that provide a good match to empirical data, it will
be useful to construct a nested family of exponential random graph models that can
be used to evaluate the empirical evidence for spatial and other forms of baseline
homophily, inbreeding homophily and endogenous clustering effects. Indeed, reference
to empirical data immediately raises the possibility of at least two alternative
conceptualisations of a spatial model: the first, a geographical space, in which
geographical coordinates are associated with each node; and the second, a more abstract
"social" space, in which spatial proximities reflect baseline homophily across a broad
range of individual attributes. In each case, spatial locations may be observed or
unobserved. For the current model and the case of observed locations, it would be
necessary to estimate the model parameters $p$ and $H$ from the combination of location
and network data; in the case of unobserved locations, it would be desirable to use a
version of Hoff, Handcock and Raftery's [?] more general exponential random graph
model specifications to estimate model parameters from network data alone. More
generally, it would be desirable to construct, within the exponential random graph
model family, models that also include inbreeding homophily and endogenous clustering
effects, together with associated estimation methods.

The model described in this paper takes a useful first step towards the construction
of such a family of models. The results presented here suggest that the fit of models
to empirical data will need careful quantitative evaluation (e.g. as in [?], [45]), because
many models within the family are likely to be capable of exhibiting, in broad terms,
the commonly observed features of empirical social networks laid out in Section 1.2.
Figure 1: The alliance network in the Eastern Central Highlands of New Guinea. Note
that the numbers next to the nodes are for illustrative purposes only, and the spatial
arrangement of the nodes does not reflect the actual tribe locations.

[54] Zipf, G. K., Human behaviour and the principle of least effort, Addison-Wesley
(1949).

Figure 2: A 100-node graph ($p = 0.05$, $p_b = 0.95$, and $H = 3/2$) taken from the
Markov chain Monte Carlo simulation in Section 3. Points are distributed according
to a Poisson point process.

Figure 3: A plot of the average number of short edges $\langle L_<(x) \rangle$ and the average number of
long edges $\langle L_>(x) \rangle$ versus $p_b$.

Figure 4: A plot of the average geodesic distance $\langle w(X) \rangle$ (upper graph) and the average
clustering coefficient $\langle C(X) \rangle$ (lower graph) against $p_b$.

Figure 5: A plot of the average $k$-triangle statistic $\langle T_\lambda(X) \rangle$ [45] for $\lambda = 2$ against $p_b$.

Figure 6: A plot of the average largest component size $\langle \Gamma(X) \rangle$ against $p_b$.

Figure 7: Average degree distribution for various values of $p_b$.

Figure 8: Scatterplot of the degree of nodes versus distance from the centre of the
graph in one typical instance of the model ($N = 100$, $p = 0.05$, $p_b = 0.8$). The line is
the least-squares fit line.

Figure 9: An office communication network of 33 individuals [?]. The locations of the
nodes reflect the cubicle or room locations. Note that the "self-loop" in fact indicates
a link between two individuals who share the same office space.

Figure 10: A plot of edge probability $p(d)$ against distance $d$ in the communication
network. The solid bars are the empirical edge probabilities and the dashed line is the
fitted step function with neighbourhood radius set at $H = 0.2$. Note that the distance
has been scaled so that all nodes fit into a 1 × 1 square. All pairs of nodes are less than
distance 1.1 apart.

Figure 11: A spatial re-arrangement of the office communication network.